Stata has a very useful command, -ml-, that can be used to estimate almost any linear or nonlinear model by maximum likelihood.
Properly speaking, this command can be used to obtain any M-type estimator; however, I'll concentrate on maximum likelihood models.
In this small tutorial, I'll provide a short example of how to use -ml- in combination with -margins- to estimate marginal effects for any model of interest, as long as one creates a program that identifies the outcome of interest.
There are many ways one can use -margins- to obtain "correct" marginal effects after estimating MLE models. Typical examples use the -margins- option expression(). I find that approach a bit complicated when the outcome of interest is complex, so I prefer doing it the way Stata does it: creating a small program that defines the outcome of interest.
If you look at the information stored in -e()- by all official Stata commands (and most community-contributed commands), you will find there is something stored in e(predict). This "something" is the name of a program that defines the outcome(s) of interest, which can later be used for other purposes, perhaps most commonly obtaining model predictions or marginal effects.
So, in order to use -margins- with -ml-, we need the ability to modify the information stored in e(predict), and redirect -margins- to a command of our own design.
In order to do so, I like to have the following program at hand:
. program adde, eclass
  1.         ereturn `0'
  2. end
This is perhaps the simplest program you will ever write, and yet one of the most flexible. It simply "adds" any information you want to e().
For example, say you estimate a simple OLS model using the auto dataset and then add some information to e(). Specifically, I'll add a local macro named "tag", stored as e(tag), that says "This is a very simple regression".
. sysuse auto, clear
(1978 Automobile Data)

. reg price mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. adde local tag "This is a very simple regression"
You can see that this was added to e() by typing -ereturn list-. You will also see that -regress- stores in e(predict) the name of a program, -regres_p-, which is used by -margins- and -predict-.
. ereturn list

scalars:
                  e(N) =  74
               e(df_m) =  1
               e(df_r) =  72
                  e(F) =  20.25835256291883
                 e(r2) =  .2195828561874974
               e(rmse) =  2623.652888667586
                e(mss) =  139449473.5462301
                e(rss) =  495615922.5753915
               e(r2_a) =  .2087437291901015
                 e(ll) =  -686.5395809065244
               e(ll_0) =  -695.7128688987767
               e(rank) =  2

macros:
                e(tag) : "This is a very simple regression"
            e(cmdline) : "regress price mpg"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "price"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 2
                  e(V) :  2 x 2

functions:
             e(sample)
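Once added, e(tag) behaves like any other stored result. For example, you can retrieve it, or the contents of e(predict), directly:

. display "`e(tag)'"
This is a very simple regression

. display "`e(predict)'"
regres_p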
As I mentioned before, -ml- is a very useful Stata command that allows you to obtain MLE estimates, given that you define the appropriate objective function.
-ml- has many different methods (lf, d0, d1, d2, lf0, lf1, lf2, etc.), each requiring you to define the objective function (usually the log-likelihood function) and possibly its gradient or Hessian. For most purposes, however, I find -lf- (the simplest one) to be the only one you may ever need.
When you use -ml- with the -lf- method, you just need to declare the log-likelihood contribution of each observation in the sample.
To ground this concept, let's assume that we want to estimate a simple linear regression using MLE. For this we need to assume either that the error follows a normal distribution, or, equivalently, that the outcome follows a conditionally normal distribution: $$ y_i = x_i \beta + \varepsilon_i $$ $$ y_i \sim N(x_i \beta, \sigma) $$ or $$ \varepsilon_i \sim N(0, \sigma) $$
This implies that the likelihood and log-likelihood of a single observation are equal to: $$ L_i = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left(\frac{y_i-x_i \beta}{\sigma}\right)^2 } $$ $$ LL_i = -\log(\sigma)-\frac{1}{2} \log(2\pi) -\frac{1}{2} \left(\frac{y_i-x_i \beta}{\sigma}\right)^2 $$ So we just need to write a program that defines this log-likelihood function. This is straightforward:
. program myols_mle
  1.         args lnf xb lnsigma
  2.         local sigma exp(`lnsigma')
  3.         qui: replace `lnf' = log(normalden($ML_y1, `xb', `sigma'))
  4. end
Notice that this program has 3 arguments: `lnf', the variable that will hold each observation's log-likelihood contribution; `xb', the linear index for the conditional mean; and `lnsigma', the linear index for the log of the standard deviation.
Notice here that we are not estimating \(\sigma \) directly, because it is numerically more stable to estimate its log (which also guarantees that the implied \(\sigma \) is positive).
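As a quick sanity check of the evaluator, you can re-fit the auto regression from before (output omitted); here /lnsigma declares a constant-only equation for the log standard deviation. Since for the normal linear model the maximum likelihood and OLS estimates of the coefficients coincide, the point estimates for mpg and _cons should match the -regress- output above:

. sysuse auto, clear
. ml model lf myols_mle (price = mpg) /lnsigma, maximize
. ml display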
Now for the interesting part: creating our own "predict" program.
. program myols_mle_p
  1.         syntax newvarname [if] [in], [mean sigma emean *]
  2.         if "`mean'`sigma'`emean'" == "" {
  3.                 ml_p `0'
  4.         }
  5.         marksample touse, novarlist
  6.         if "`mean'" != "" {
  7.                 tempvar xb
  8.                 _predict double `xb', eq(#1)
  9.                 gen `typlist' `varlist' = `xb' if `touse'
 10.                 label var `varlist' "E(y|X)"
 11.         }
 12.         else if "`sigma'" != "" {
 13.                 tempvar xb
 14.                 _predict double `xb', eq(#2)
 15.                 gen `typlist' `varlist' = exp(`xb') if `touse'
 16.                 label var `varlist' "E(sigma|X)"
 17.         }
 18.         else if "`emean'" != "" {
 19.                 tempvar xb lns
 20.                 _predict double `xb', eq(#1)
 21.                 _predict double `lns', eq(#2)
 22.                 local sigma (exp(`lns'))
 23.                 gen `typlist' `varlist' = exp(`xb')*exp(0.5*`sigma'^2) if `touse'
 24.                 label var `varlist' "E(exp(Y)|X)"
 25.         }
 26. end
This program, here named -myols_mle_p-, can be used to estimate 3 things:
- mean: the conditional mean of the outcome, E(y|X), which is just the linear prediction from the first equation
- sigma: the conditional standard deviation, E(sigma|X), obtained by exponentiating the second equation
- emean: the conditional mean of the exponentiated outcome, E(exp(Y)|X), which is useful when the dependent variable is in logs
For the last case, -emean-, we are using the expected value of a lognormal distribution: $$ \log(y_i) \sim N(\mu_y,\sigma_y) $$ $$ E(y_i) = e^{\mu_y} e^{\frac{1}{2}\sigma_y^2} $$
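If you want to convince yourself of this formula, a quick simulation helps (the values \(\mu_y = 1\) and \(\sigma_y = 0.5\) are arbitrary choices); the sample mean of y should come out close to the theoretical value \(e^{1 + 0.5(0.5)^2} \approx 3.08\) displayed by the last line:

. clear
. set seed 123
. set obs 100000
. generate lny = 1 + 0.5*rnormal()
. generate y = exp(lny)
. summarize y
. display exp(1 + 0.5*0.5^2)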
Alright, let's see how it works. First, let's load the data:
. use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
(Excerpt from the Swiss Labor Market Survey 1998)
Then, we estimate the model using -ml-. For this, we specify that the model will be estimated using the "lf" approach, and that the log-likelihood is defined by the program -myols_mle-.
Afterwards, we declare which variables will be used to model the conditional mean and, just for fun, the conditional standard deviation. Both parameters will be modelled as functions of education, experience, and gender. This allows for a specific type of heteroskedasticity.
This is the model we will be estimating:
$$y_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 female_i + \varepsilon_i$$ $$\varepsilon_i \sim N(0,\sigma(x_i)^2)$$ $$\log(\sigma(x_i)) = \gamma_0 + \gamma_1 educ_i + \gamma_2 exper_i + \gamma_3 female_i$$

. ml model lf myols_mle (lnwage = educ exper i.female) (lnsigma:= educ exper i.female), maximize

initial:       log likelihood = -9602.9223
alternative:   log likelihood = -4263.0081
rescale:       log likelihood = -3318.4554
rescale eq:    log likelihood =  -1777.883
Iteration 0:   log likelihood =  -1777.883  (not concave)
Iteration 1:   log likelihood = -1241.2179
Iteration 2:   log likelihood = -1029.3117
Iteration 3:   log likelihood = -876.30431
Iteration 4:   log likelihood = -871.21438
Iteration 5:   log likelihood = -871.18998
Iteration 6:   log likelihood = -871.18998

. ml display

                                                      Number of obs   =  1,434
                                                      Wald chi2(3)    = 371.53
Log likelihood = -871.18998                           Prob > chi2     = 0.0000

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
        educ |   .0736371   .0045872    16.05   0.000     .0646464    .0826278
       exper |   .0105825   .0010551    10.03   0.000     .0085147    .0126504
    1.female |  -.1118167    .024106    -4.64   0.000    -.1590636   -.0645698
       _cons |   2.430164   .0641158    37.90   0.000     2.304499    2.555828
-------------+----------------------------------------------------------------
lnsigma      |
        educ |  -.0356134   .0067794    -5.25   0.000    -.0489008   -.0223259
       exper |  -.0165439   .0016363   -10.11   0.000     -.019751   -.0133368
    1.female |   .2066704   .0382432     5.40   0.000     .1317152    .2816256
       _cons |  -.2813734   .0910696    -3.09   0.002    -.4598665   -.1028802
------------------------------------------------------------------------------
You can compare these results with standard -regress- output, or with -hetregress- if you want a comparison that also allows for heteroskedastic errors.
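For example, the following two commands fit the comparable homoskedastic and heteroskedastic models (keep in mind that -hetregress- parameterizes the log of the variance rather than the log of the standard deviation, so its variance-equation coefficients should be roughly twice the lnsigma coefficients above; also, both commands overwrite the -ml- results in e(), so re-fit the model before continuing):

. regress lnwage educ exper i.female
. hetregress lnwage educ exper i.female, het(educ exper i.female)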
Now, if we want to use our own predict program, we need to "add" it to e():
. adde local predict myols_mle_p
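A nice side effect is that -predict- itself now dispatches to our program, so the three outcomes are also available as regular predictions (the variable names below are just illustrative):

. predict double yhat, mean
. predict double sighat, sigma
. predict double ewage, emean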
And that is pretty much it. We can now use -margins- to estimate the effects of education, experience, and gender on log wages (trivial), on the standard deviation, or on wages in levels.
. margins, dydx(*) predict(mean)

Average marginal effects                        Number of obs     =      1,434
Model VCE    : OIM

Expression   : E(y|X), predict(mean)
dy/dx w.r.t. : educ exper 1.female

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0736371   .0045872    16.05   0.000     .0646464    .0826278
       exper |   .0105825   .0010551    10.03   0.000     .0085147    .0126504
    1.female |  -.1118167    .024106    -4.64   0.000    -.1590636   -.0645698
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) predict(sigma)

Average marginal effects                        Number of obs     =      1,434
Model VCE    : OIM

Expression   : E(sigma|X), predict(sigma)
dy/dx w.r.t. : educ exper 1.female

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.0161816   .0031133    -5.20   0.000    -.0222836   -.0100796
       exper |   -.007517    .000774    -9.71   0.000    -.0090341       -.006
    1.female |   .0938119   .0175522     5.34   0.000     .0594102    .1282136
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) predict(emean)

Average marginal effects                        Number of obs     =      1,434
Model VCE    : OIM

Expression   : E(exp(Y)|X), predict(emean)
dy/dx w.r.t. : educ exper 1.female

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   2.175891    .169221    12.86   0.000     1.844224    2.507558
       exper |    .236423   .0394785     5.99   0.000     .1590466    .3137995
    1.female |   -2.27482   .8226334    -2.77   0.006    -3.887152   -.6624884
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
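As an extra check on the last set of results, we can compute the analytic derivative of E(exp(Y)|X) with respect to educ and average it over the sample. This is just a sketch using -predictnl- (me_educ is an illustrative name); its mean should be close to the 2.176 that -margins- reports above:

. predictnl double me_educ = exp(xb(eq1) + 0.5*exp(xb(lnsigma))^2)*(_b[eq1:educ] + exp(xb(lnsigma))^2*_b[lnsigma:educ])
. summarize me_educ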
So how do we interpret the results? Here is one possibility:
Each additional year of education increases wages by about 7.4%; however, as education increases, the dispersion of wages decreases. In terms of actual dollar changes, on average, that additional year of education translates into a \$2.17 per hour increase.